A Faster Compressed Suffix Trees for Repetitive Collections

نویسندگان

  • Gonzalo Navarro
  • Alberto Ordóñez
چکیده

Recent compressed suffix trees targeted to highly repetitive sequence collections reach excellent compression performance, but operation times are very high. We design a new suffix tree representation for this scenario that still achieves very low space usage, only slightly larger than the best previous one, but supports the operations orders of magnitude faster. Our suffix tree is still orders of magnitude slower than generalpurpose compressed suffix trees, but these use several times more space when the collection is repetitive. Our main novelty is a practical grammar-compressed tree representation with full navigation functionality, which is useful in all applications where large trees with repetitive topology must be represented.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Faster Compressed Suffix Trees for Repetitive Text Collections

Recent compressed suffix trees targeted to highly repetitive text collections reach excellent compression performance, but operation times in the order of milliseconds. We design a new suffix tree representation for this scenario that still achieves very low space usage, only slightly larger than the best previous one, but supports the operations within microseconds. This puts the data structur...

متن کامل

Relative Suffix Trees

Suffix trees are one of the most versatile data structures in stringology, with many applications in bioinformatics. Their main drawback is their size, which can be tens of times larger than the input sequence. Much effort has been put into reducing the space usage, leading ultimately to compressed suffix trees. These compressed data structures can efficiently simulate the suffix tree, while us...

متن کامل

Compressed Suffix Trees for Repetitive Texts

We design a new compressed suffix tree specifically tailored to highly repetitive text collections. This is particularly useful for sequence analysis on large collections of genomes of the close species. We build on an existing compressed suffix tree that applies statistical compression, and modify it so that it works on the grammar-compressed version of the longest common prefix array, whose d...

متن کامل

Practical Compressed Suffix Trees

The suffix tree is an extremely important data structure in bioinformatics. Classical implementations require much space, which renders them useless to handle large sequence collections. Recent research has obtained various compressed representations for suffix trees, with widely different space-time tradeoffs. In this paper we show how the use of range min-max trees yields novel representation...

متن کامل

A compressed dynamic self-index for highly repetitive text collections

We present a novel compressed dynamic self-index for highly repetitive text collections. Signature encoding, an existing self-index of this type, has a large disadvantage of slow pattern search for short patterns. We obtain faster pattern search by leveraging the idea behind a truncated suffix tree (TST) to develop the first compressed dynamic self-index, called the TST-index, that supports not...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2015